Results 1 - 17 of 17
1.
Comput Struct Biotechnol J ; 20: 1413-1426, 2022.
Article in English | MEDLINE | ID: mdl-35386103

ABSTRACT

The recent advancements in toxicogenomics have led to the availability of large omics data sets, representing the starting point for studying the exposure mechanism of action and identifying candidate biomarkers for toxicity prediction. The current lack of standard methods in data generation and analysis hampers the full exploitation of toxicogenomics-based evidence in regulatory risk assessment. Moreover, the pipelines for the preprocessing and downstream analyses of toxicogenomic data sets can be quite challenging to implement. Over the years, we have developed a number of software packages to address specific questions related to multiple steps of toxicogenomics data analysis and modelling. In this review we present the Nextcast software collection and discuss how its individual tools can be combined into efficient pipelines to answer specific biological questions. Nextcast components are of great support to the scientific community for analysing and interpreting large data sets for the toxicity evaluation of compounds in an unbiased, straightforward, and reliable manner. The Nextcast software suite is available at https://github.com/fhaive/nextcast.

2.
Proc Natl Acad Sci U S A ; 117(52): 33474-33485, 2020 12 29.
Article in English | MEDLINE | ID: mdl-33318199

ABSTRACT

Contact dermatitis tremendously impacts the quality of life of suffering patients. Currently, diagnostic regimes rely on allergy testing, exposure specification, and follow-up visits; however, distinguishing the clinical phenotype of irritant and allergic contact dermatitis remains challenging. Employing integrative transcriptomic analysis and machine-learning approaches, we aimed to decipher disease-related signature genes to find suitable sets of biomarkers. A total of 89 positive patch-test reaction biopsies against four contact allergens and two irritants were analyzed via microarray. Coexpression network analysis and Random Forest classification were used to discover potential biomarkers and selected biomarker models were validated in an independent patient group. Differential gene-expression analysis identified major gene-expression changes depending on the stimulus. Random Forest classification identified CD47, BATF, FASLG, RGS16, SYNPO, SELE, PTPN7, WARS, PRC1, EXO1, RRM2, PBK, RAD54L, KIFC1, SPC25, PKMYT, HISTH1A, TPX2, DLGAP5, TPX2, CH25H, and IL37 as potential biomarkers to distinguish allergic and irritant contact dermatitis in human skin. Validation experiments and prediction performances on external testing datasets demonstrated potential applicability of the identified biomarker models in the clinic. Capitalizing on this knowledge, novel diagnostic tools can be developed to guide clinical diagnosis of contact allergies.


Subject(s)
Biomarkers/metabolism; Dermatitis, Allergic Contact/diagnosis; Dermatitis, Irritant/diagnosis; Machine Learning; Adult; Algorithms; Allergens; Databases, Genetic; Dermatitis, Allergic Contact/genetics; Dermatitis, Irritant/genetics; Diagnosis, Differential; Female; Gene Expression Regulation; Gene Regulatory Networks; Humans; Irritants; Leukocytes/metabolism; Male; Patch Tests; Reproducibility of Results; Severity of Illness Index; Skin/pathology; Transcriptome/genetics
3.
Adv Sci (Weinh) ; 7(22): 2002221, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33240770

ABSTRACT

Despite considerable efforts, the properties that drive the cytotoxicity of engineered nanomaterials (ENMs) remain poorly understood. Here, the authors investigate a panel of 31 ENMs with different core chemistries and a variety of surface modifications using conventional in vitro assays coupled with omics-based approaches. Cytotoxicity screening and multiplex-based cytokine profiling reveal a good concordance between primary human monocyte-derived macrophages and the human monocyte-like cell line THP-1. Proteomics analysis following a low-dose exposure of cells suggests a nonspecific stress response to ENMs, while microarray-based profiling reveals significant changes in gene expression as a function of both surface modification and core chemistry. Pathway analysis highlights that the ENMs with cationic surfaces that are shown to elicit cytotoxicity downregulated DNA replication and cell cycle responses, while inflammatory responses are upregulated. These findings are validated using cell-based assays. Notably, certain small, PEGylated ENMs are found to be noncytotoxic, yet they induce transcriptional responses reminiscent of viruses. In sum, using a multiparametric approach, it is shown that surface chemistry is a key determinant of cellular responses to ENMs. The data also reveal that cytotoxicity, determined by conventional in vitro assays, does not necessarily correlate with transcriptional effects of ENMs.

4.
Bioinformatics ; 36(9): 2932-2933, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31950985

ABSTRACT

MOTIVATION: The analysis of dose-dependent effects on gene expression is gaining attention in the field of toxicogenomics. Currently available computational methods are usually limited to specific omics platforms or biological annotations and are able to analyse only one experiment at a time. RESULTS: We developed the software BMDx with a graphical user interface for the Benchmark Dose (BMD) analysis of transcriptomics data. We implemented an approach based on the fitting of multiple models and the selection of the optimal model based on the Akaike Information Criterion. The BMDx tool takes as input a gene expression matrix and a phenotype table, and computes the BMD, its related values, and IC50/EC50 estimations. It reports interactive tables and plots that the user can investigate for further details of the fitting, dose effects and functional enrichment. BMDx allows a fast and convenient comparison of the BMD values of a transcriptomics experiment at different time points and an effortless way to interpret the results. Furthermore, BMDx allows users to analyse and compare multiple experiments at once. AVAILABILITY AND IMPLEMENTATION: BMDx is implemented as R/Shiny software and is available at https://github.com/Greco-Lab/BMDx/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
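
The model-selection idea in this abstract (fit several candidate models, keep the one with the lowest Akaike Information Criterion) can be sketched as follows. This is not BMDx code, which is an R/Shiny application; it is a minimal pure-Python illustration. It assumes least-squares fits with Gaussian errors, so that AIC reduces to n·ln(RSS/n) + 2k up to an additive constant. The two candidate models (constant and linear) and all function names are hypothetical choices made for the example.

```python
import math

def fit_constant(doses, responses):
    """Fit y = a. Returns (predict function, number of parameters)."""
    a = sum(responses) / len(responses)
    return (lambda d: a), 1

def fit_linear(doses, responses):
    """Fit y = a + b*d by ordinary least squares."""
    n = len(doses)
    mean_d = sum(doses) / n
    mean_y = sum(responses) / n
    sxx = sum((d - mean_d) ** 2 for d in doses)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(doses, responses))
    b = sxy / sxx
    a = mean_y - b * mean_d
    return (lambda d: a + b * d), 2

def aic(doses, responses, predict, k):
    """AIC for a least-squares fit with Gaussian errors (constant dropped)."""
    n = len(doses)
    rss = sum((y - predict(d)) ** 2 for d, y in zip(doses, responses))
    rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
    return n * math.log(rss / n) + 2 * k

def select_model(doses, responses):
    """Fit every candidate and return (name of lowest-AIC model, all scores)."""
    candidates = {"constant": fit_constant, "linear": fit_linear}
    scores = {}
    for name, fit in candidates.items():
        predict, k = fit(doses, responses)
        scores[name] = aic(doses, responses, predict, k)
    best = min(scores, key=scores.get)
    return best, scores
```

With a clearly dose-dependent gene the linear model wins despite its extra parameter; with a flat gene the penalty term 2k favours the constant model.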


Subject(s)
Benchmarking; Computational Biology; Software; Toxicogenetics; Transcriptome
5.
Part Fibre Toxicol ; 16(1): 28, 2019 07 05.
Article in English | MEDLINE | ID: mdl-31277695

ABSTRACT

BACKGROUND: Copper oxide (CuO) nanomaterials are used in a wide range of industrial and commercial applications. These materials can be hazardous, especially if they are inhaled. As a result, the pulmonary effects of CuO nanomaterials have been studied in healthy subjects, but limited knowledge exists today about their effects on lungs with allergic airway inflammation (AAI). The objective of this study was to investigate how pristine CuO modulates allergic lung inflammation and whether surface modifications can influence its reactivity. CuO and its carboxylated (CuO COOH), methylaminated (CuO NH3) and PEGylated (CuO PEG) derivatives were administered here on four consecutive days via oropharyngeal aspiration in a mouse model of AAI. Standard genome-wide gene expression profiling as well as conventional histopathological and immunological methods were used to investigate the modulatory effects of the nanomaterials on both the healthy and the compromised immune system. RESULTS: Our data demonstrate that although CuO materials did not considerably influence hallmarks of allergic airway inflammation, the materials exacerbated the existing lung inflammation by eliciting dramatic pulmonary neutrophilia. Transcriptomic analysis showed that CuO, CuO COOH and CuO NH3 commonly enriched neutrophil-related biological processes, especially in healthy mice. In sharp contrast, CuO PEG had a significantly lower potential in triggering changes in lungs of healthy and allergic mice, revealing that surface PEGylation suppresses the effects triggered by the pristine material. CONCLUSIONS: CuO as well as its functionalized forms worsen allergic airway inflammation by causing neutrophilia in the lungs; however, our results also show that surface PEGylation can be a promising approach for inhibiting the effects of pristine CuO. Our study provides information for health and safety assessment of modified CuO materials, and it can be useful in the development of nanomedical applications.


Subject(s)
Copper/toxicity; Nanoparticles/toxicity; Neutrophil Infiltration/drug effects; Pneumonia/chemically induced; Polyethylene Glycols/chemistry; Transcriptome/drug effects; Animals; Copper/chemistry; Female; Gene Expression Profiling; Genome-Wide Association Study; Mice, Inbred BALB C; Nanoparticles/chemistry; Ovalbumin/immunology; Pneumonia/genetics; Pneumonia/immunology; Pneumonia/pathology; Surface Properties
6.
Source Code Biol Med ; 14: 1, 2019.
Article in English | MEDLINE | ID: mdl-30728855

ABSTRACT

BACKGROUND: Application of microarrays in omics technologies enables quantification of many biomolecules simultaneously. It is widely applied to observe the positive or negative effect on biomolecule activity in the perturbed versus the steady state by quantitative comparison. Community resources, such as Bioconductor and CRAN, host tools based on the R language that have become standard for high-throughput analytics. However, application of these tools is technically challenging for generic users and requires specific computational skills. There is a need for an intuitive and easy-to-use platform to process omics data, visualize, and interpret results. RESULTS: We propose an integrated software solution, eUTOPIA, that implements a set of essential processing steps as a guided workflow presented to the user as an R Shiny application. CONCLUSIONS: eUTOPIA allows researchers to perform preprocessing and analysis of microarray data via a simple and intuitive graphical interface while using state-of-the-art methods.

7.
BMC Bioinformatics ; 20(1): 79, 2019 Feb 15.
Article in English | MEDLINE | ID: mdl-30767762

ABSTRACT

BACKGROUND: Functional annotation of genes is an essential step in omics data analysis. Multiple databases and methods are currently available to summarize the functions of sets of genes into higher level representations, such as ontologies and molecular pathways. Annotating results from omics experiments into functional categories is essential not only to understand the underlying regulatory dynamics but also to compare multiple experimental conditions at a higher level of abstraction. Several tools are already available to the community to represent and compare functional profiles of omics experiments. However, when the number of experiments and/or enriched functional terms is high, it becomes difficult to interpret the results even when graphically represented. Therefore, there is currently a need for interactive and user-friendly tools to graphically navigate and further summarize annotations in order to facilitate interpretation of results even when the dimensionality is high. RESULTS: We developed an approach that exploits the intrinsic hierarchical structure of several functional annotations to summarize the results obtained through enrichment analyses to higher levels of interpretation and to map gene related information at each summarized level. We built a user-friendly graphical interface that allows users to visualize the functional annotations of one or multiple experiments at once. The tool is implemented as an R Shiny application called FunMappOne and is available at https://github.com/grecolab/FunMappOne . CONCLUSION: FunMappOne is an R Shiny graphical tool that takes as input multiple lists of human or mouse genes, optionally along with their related modification magnitudes, computes the enriched annotations from Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, or Reactome databases, and reports interactive maps of functional terms and pathways organized in rational groups. FunMappOne allows a fast and convenient comparison of multiple experiments and an easy way to interpret results.
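
The two steps described in this abstract, enrichment followed by summarization along the annotation hierarchy, can be sketched in a few lines. This is not FunMappOne code. The abstract does not name the enrichment statistic, so the sketch assumes the commonly used hypergeometric over-representation test; the term names and the `parent_of` mapping are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, term_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from universe_size genes, term_size of which carry the annotation."""
    total = comb(universe_size, list_size)
    upper = min(term_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def summarize_by_parent(enriched_terms, parent_of):
    """Group enriched terms under their parent category (one hierarchy level up).
    Terms without a known parent form their own group."""
    groups = {}
    for term in enriched_terms:
        groups.setdefault(parent_of.get(term, term), []).append(term)
    return groups
```

Grouping terms under a shared parent is what keeps a map readable when many experiments each yield many enriched terms.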


Subject(s)
Computational Biology/methods; Computer Graphics; Databases, Factual; Gene Ontology; Genes; Molecular Sequence Annotation; Software; Animals; Humans; Mice
8.
Data Brief ; 19: 1046-1057, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30228994

ABSTRACT

We present data derived from an exposure experiment in which three cell-lines representative of cell types of the respiratory tissue (epithelial type-I A549, epithelial type-II BEAS-2B, and macrophage THP-1) have been exposed to ten different carbon-based nanomaterials for 48 h. In particular, we provide: genome-wide mRNA and miRNA expression, and DNA methylation; gene tables, containing information on the aberrations induced in these three genomic data layers at the gene level; mechanism of action (MOA) maps representing the comparative functional alteration induced in each cell line and each exposure.

9.
Bioinformatics ; 34(12): 2136-2138, 2018 06 15.
Article in English | MEDLINE | ID: mdl-29425308

ABSTRACT

Summary: Detecting and interpreting responsive modules from gene expression data by using network-based approaches is a common but laborious task. It often requires the application of several computational methods implemented in different software packages, forcing biologists to compile complex analytical pipelines. Here we introduce INfORM (Inference of NetwOrk Response Modules), an R shiny application that enables non-expert users to detect, evaluate and select gene modules with high statistical and biological significance. INfORM is a comprehensive tool for the identification of biologically meaningful response modules from consensus gene networks inferred by using multiple algorithms. It is accessible through an intuitive graphical user interface allowing for a level of abstraction from the computational steps. Availability and implementation: INfORM is freely available for academic use at https://github.com/Greco-Lab/INfORM. Supplementary information: Supplementary data are available at Bioinformatics online.
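
To make the notion of a consensus gene network concrete, here is a minimal sketch. It is not INfORM code, and the abstract does not say how the outputs of the different inference algorithms are combined. The sketch assumes a simple voting rule: an undirected edge is kept if at least `min_votes` of the input networks contain it. Gene names in the usage are hypothetical.

```python
from collections import Counter

def consensus_edges(networks, min_votes):
    """Return the undirected edges present in at least min_votes networks.

    networks: list of edge lists, one per inference algorithm;
              each edge is a pair of gene names.
    """
    votes = Counter()
    for edges in networks:
        # Sort each pair so (A, B) and (B, A) count as the same edge,
        # and use a set so a network votes at most once per edge.
        votes.update({tuple(sorted(edge)) for edge in edges})
    return {edge for edge, n in votes.items() if n >= min_votes}
```

For example, `consensus_edges([[("A", "B")], [("B", "A")], [("A", "C")]], 2)` keeps only the A-B edge.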


Subject(s)
Computational Biology/methods; Gene Expression; Gene Regulatory Networks; Software; Algorithms
10.
NanoImpact ; 11: 99-108, 2018 Jul.
Article in English | MEDLINE | ID: mdl-32140619

ABSTRACT

New strategies to characterize the effects of engineered nanomaterials (ENMs) based on omics technologies are emerging. However, given the intricate interplay of multiple regulatory layers, the study of a single molecular species in exposed biological systems might not allow the needed granularity to successfully identify the pathways of toxicity (PoT) and, hence, to portray adverse outcome pathways (AOPs). Moreover, the intrinsic diversity of different cell types composing the exposed organs and tissues in living organisms poses a problem when transferring in vivo experimentation into cell-based in vitro systems. To overcome these limitations, we have profiled genome-wide DNA methylation, mRNA and microRNA expression in three human cell lines representative of relevant cell types of the respiratory system, A549, BEAS-2B and THP-1, exposed to a low dose of ten carbon nanomaterials (CNMs) for 48 h. We applied advanced data integration and modelling techniques in order to build comprehensive regulatory and functional maps of the CNM effects in each cell type. We observed that different cell types respond differently to the same CNM exposure even at concentrations exerting similar phenotypic effects. Furthermore, we linked patterns of genomic and epigenomic regulation to intrinsic properties of CNMs. Interestingly, DNA methylation and microRNA expression only partially explain the mechanism of action (MOA) of CNMs. Taken together, our results strongly support the implementation of approaches based on multi-omics screenings on multiple tissues/cell types, along with systems biology-based multi-variate data modelling, in order to build more accurate AOPs.

11.
Genes (Basel) ; 8(10)2017 Oct 20.
Article in English | MEDLINE | ID: mdl-29053642

ABSTRACT

Inherited retinal diseases (IRDs) are often associated with variable clinical expressivity (VE) and incomplete penetrance (IP). Underlying mechanisms may include environmental, epigenetic, and genetic factors. Cis-acting expression quantitative trait loci (cis-eQTLs) can be implicated in the regulation of genes by favoring or hampering the expression of one allele over the other. Thus, the presence of such loci elicits allelic expression imbalance (AEI) that can be traced by massively parallel sequencing techniques. In this study, we performed an AEI analysis on RNA-sequencing (RNA-seq) data from 52 healthy retina donors, which identified 194 imbalanced single nucleotide polymorphisms (SNPs) in 67 IRD genes. Focusing on SNPs displaying AEI at a frequency higher than 10%, we found evidence of AEI in several IRD genes regularly associated with IP and VE (BEST1, RP1, PROM1, and PRPH2). Based on these SNPs commonly undergoing AEI, we performed pyrosequencing in an independent sample set of 17 healthy retina donors in order to confirm our findings. Indeed, we were able to validate CDHR1, BEST1, and PROM1 to be subjected to cis-acting regulation. With this work, we aim to shed light on differentially expressed alleles in the human retina transcriptome that, in the context of autosomal dominant IRD cases, could help to explain IP or VE.
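
As an illustration of how allelic expression imbalance can be tested at a heterozygous SNP, the sketch below compares reference and alternative allele read counts against an expected 50:50 ratio. The abstract does not state which statistical test the authors used; an exact two-sided binomial test is assumed here because it is a common choice for this kind of count data. The function name is hypothetical.

```python
from math import comb

def binomial_two_sided_p(ref_count, alt_count, p_ref=0.5):
    """Exact two-sided binomial p-value for allelic read counts.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null proportion p_ref.
    """
    n = ref_count + alt_count
    pmf = [comb(n, k) * p_ref ** k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Small relative tolerance so ties are not lost to rounding.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)
```

A SNP would then be flagged as imbalanced in a given donor when this p-value, after multiple-testing correction, falls below a chosen threshold.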

12.
ACS Nano ; 11(4): 3786-3796, 2017 04 25.
Article in English | MEDLINE | ID: mdl-28380293

ABSTRACT

Understanding the complex molecular alterations related to engineered nanomaterial (ENM) exposure is essential for carrying out toxicity assessment. Current experimental paradigms rely on both in vitro and in vivo exposure setups that are often difficult to compare, which calls into question how well cell models mimic more complex exposure scenarios at the organism level. Here, we have systematically investigated transcriptomic responses of the THP-1 macrophage cell line and lung tissues of mice, after exposure to several carbon nanomaterials (CNMs). Under the assumption that the CNM exposure related molecular alterations are mixtures of signals related to their intrinsic properties, we inferred networks of responding genes, whose expression levels are coordinately altered in response to specific CNM intrinsic properties. We observed only a minute overlap between the sets of intrinsic property-correlated genes at different exposure scenarios, suggesting specific transcriptional programs working in different exposure scenarios. However, when the effects of the CNM were investigated at the level of significantly altered molecular functions, a broader picture of substantial commonality emerged. Our results imply that in vitro exposures can efficiently recapitulate the complex molecular functions altered in vivo. In this study, altered molecular pathways in response to specific CNM intrinsic properties have been systematically characterized from transcriptomic data generated from multiple exposure setups. Our computational approach to the analysis of network response modules further revealed similarities between in vitro and in vivo exposures that could not be detected by traditional analysis of transcriptomics data. Our analytical strategy also opens the possibility of looking for pathways of toxicity and understanding the molecular and cellular responses identified across predefined biological themes.


Subject(s)
Gene Regulatory Networks; Nanotubes, Carbon/chemistry; RNA, Transfer/genetics; Animals; Cell Line; Databases, Genetic; Female; Mice; Mice, Inbred C57BL; Oligonucleotide Array Sequence Analysis; Transcriptome
13.
Neurology ; 87(1): 71-6, 2016 07 05.
Article in English | MEDLINE | ID: mdl-27281536

ABSTRACT

OBJECTIVE: To apply next-generation sequencing (NGS) for the investigation of the genetic basis of undiagnosed muscular dystrophies and myopathies in a very large cohort of patients. METHODS: We applied an NGS-based platform named MotorPlex to our diagnostic workflow to test muscle disease genes with a high sensitivity and specificity for small DNA variants. We analyzed 504 undiagnosed patients mostly referred as being affected by limb-girdle muscular dystrophy or congenital myopathy. RESULTS: MotorPlex provided a complete molecular diagnosis in 218 cases (43.3%). A further 160 patients (31.7%) showed as yet unproven candidate variants. Pathogenic variants were found in 47 of 93 genes, and in more than 30% of cases, the phenotype was nonconventional, broadening the spectrum of disease presentation in at least 10 genes. CONCLUSIONS: Our large DNA study of patients with undiagnosed myopathy is an example of the ongoing revolution in molecular diagnostics, highlighting the advantages in using NGS as a first-tier approach for heterogeneous genetic conditions.


Subject(s)
Muscular Dystrophies/diagnosis; Muscular Dystrophies/genetics; Cohort Studies; Diagnosis, Differential; Female; Genetic Variation; Humans; Italy; Male; Sequence Analysis
14.
Nucleic Acids Res ; 44(12): 5773-84, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27235414

ABSTRACT

The human retina is a specialized tissue involved in light stimulus transduction. Despite its unique biology, an accurate reference transcriptome is still missing. Here, we performed gene expression analysis (RNA-seq) of 50 retinal samples from non-visually impaired post-mortem donors. We identified novel transcripts with high confidence (Observed Transcriptome (ObsT)) and quantified the expression level of known transcripts (Reference Transcriptome (RefT)). The ObsT included 77 623 transcripts (23 960 genes) covering 137 Mb (35 Mb new transcribed genome). Most of the transcripts (92%) were multi-exonic: 81% with known isoforms, 16% with new isoforms and 3% belonging to new genes. The RefT included 13 792 genes across 94 521 known transcripts. Mitochondrial genes were among the most highly expressed, accounting for about 10% of the reads. Of all the protein-coding genes in Gencode, 65% are expressed in the retina. We exploited inter-individual variability in gene expression to infer a gene co-expression network and to identify genes specifically expressed in photoreceptor cells. We experimentally validated the photoreceptor localization of three genes in human retina that had not been previously reported. RNA-seq data and the gene co-expression network are available online (http://retina.tigem.it).
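
The idea of inferring a co-expression network from inter-individual variability can be sketched as follows: two genes are linked when their expression levels across donors are strongly correlated. The abstract does not specify the correlation measure or cutoff; the sketch assumes Pearson correlation with a fixed absolute threshold, and the input layout (gene name mapped to a list of per-donor values) is a hypothetical choice.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a gene with no variability carries no co-expression signal
    return cov / (sd_x * sd_y)

def coexpression_edges(expression, threshold):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold.

    expression: dict mapping gene name -> expression values, one per donor.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges
```

Genes whose network neighbours are known photoreceptor markers would then be candidates for photoreceptor-specific expression.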


Subject(s)
Eye Proteins/genetics; Gene Regulatory Networks; Genome, Human; Mitochondrial Proteins/genetics; Retina/metabolism; Transcriptome; Adult; Aged; Alternative Splicing; Atlases as Topic; Chromosome Mapping; Exons; Eye Proteins/metabolism; Female; Gene Expression Profiling; Gene Ontology; High-Throughput Nucleotide Sequencing; Humans; Male; Middle Aged; Mitochondrial Proteins/metabolism; Molecular Sequence Annotation; Protein Isoforms/genetics; Protein Isoforms/metabolism; Retina/cytology
15.
Nucleic Acids Res ; 44(4): 1525-40, 2016 Feb 29.
Article in English | MEDLINE | ID: mdl-26819412

ABSTRACT

MicroRNAs play a fundamental role in retinal development and function. To characterise the miRNome of the human retina, we carried out deep sequencing analysis on sixteen individuals. We established the catalogue of retina-expressed miRNAs, determined their relative abundance and found that a small number of miRNAs accounts for almost 90% of the retina miRNome. We discovered more than 3000 miRNA variants (isomiRs), encompassing a wide range of sequence variations, which include seed modifications that are predicted to have an impact on miRNA action. We demonstrated that a seed-modifying isomiR of the retina-enriched miR-124-3p was endowed with different targeting properties with respect to the corresponding canonical form. Moreover, we identified 51 putative novel, retina-specific miRNAs and experimentally validated the expression for nine of them. Finally, a parallel analysis of the human Retinal Pigment Epithelium (RPE)/choroid, two tissues that are known to be crucial for retina homeostasis, yielded notably distinct miRNA enrichment patterns compared to the retina. The generated data are accessible through an ad hoc database. This study is the first to reveal the complexity of the human retina miRNome at nucleotide resolution and constitutes a unique resource to assess the contribution of miRNAs to the pathophysiology of the human retina.


Subject(s)
MicroRNAs/genetics; Retina/metabolism; Transcriptome/genetics; High-Throughput Nucleotide Sequencing/methods; Humans; MicroRNAs/isolation & purification; Retinal Pigment Epithelium/metabolism
16.
BMC Genomics ; 15 Suppl 3: S5, 2014.
Article in English | MEDLINE | ID: mdl-25078076

ABSTRACT

BACKGROUND: Mendelian disorders are mostly caused by single mutations in the DNA sequence of a gene, leading to a phenotype with pathologic consequences. Whole Exome Sequencing of patients can be a cost-effective alternative to standard genetic screenings to find causative mutations of genetic diseases, especially when the number of cases is limited. Analyzing exome sequencing data requires specific expertise, high computational resources and a reference variant database to identify pathogenic variants. RESULTS: We developed a database of variations collected from patients with Mendelian disorders, which is automatically populated thanks to an associated exome-sequencing pipeline. The pipeline is able to automatically identify, annotate and store insertions, deletions and mutations in the database. The resource is freely available online at http://exome.tigem.it. The exome sequencing pipeline automates the analysis workflow (quality control and read trimming, mapping on reference genome, post-alignment processing, variation calling and annotation) using state-of-the-art software tools. The exome-sequencing pipeline has been designed to run on a computing cluster in order to analyse several samples simultaneously. The detected variants are annotated by the pipeline not only with the standard variant annotations (e.g. allele frequency in the general population, the predicted effect on gene product activity, etc.) but, more importantly, with allele frequencies across samples progressively collected in the database itself, stratified by Mendelian disorder. CONCLUSIONS: We aim to provide a resource for the genetic disease community to automatically analyse whole exome-sequencing samples with a standard and uniform analysis pipeline, thus collecting variant allele frequencies by disorder. This resource may become a valuable tool to help dissect the genotype underlying the disease phenotype through an improved selection of putative patient-specific causative or phenotype-associated variations.
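
The disorder-stratified allele frequency annotation described in this abstract can be illustrated with a short sketch. It does not reflect the actual database schema or pipeline code. It assumes diploid, autosomal genotypes encoded as the number of alternative alleles (0, 1 or 2), and it assumes that a variant absent from a sample's record was homozygous reference in that sample. The record layout and variant identifiers are hypothetical.

```python
from collections import defaultdict

def allele_frequency_by_disorder(samples):
    """Compute the alternative allele frequency of each variant per disorder.

    samples: iterable of (disorder, genotypes) pairs, where genotypes maps
             a variant identifier to its alternative allele count (0, 1 or 2).
    """
    alt_counts = defaultdict(lambda: defaultdict(int))
    chromosome_counts = defaultdict(int)
    for disorder, genotypes in samples:
        chromosome_counts[disorder] += 2  # two chromosomes per diploid sample
        for variant, alt in genotypes.items():
            alt_counts[disorder][variant] += alt
    return {
        disorder: {v: c / chromosome_counts[disorder] for v, c in variants.items()}
        for disorder, variants in alt_counts.items()
    }
```

A variant that is frequent within one disorder group but rare in the others is the kind of signal such stratification is meant to surface.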


Subject(s)
Exome; Genetic Diseases, Inborn/genetics; Genetic Variation; Molecular Sequence Annotation; Software; Computational Biology/methods; Database Management Systems; Databases, Genetic; Genetic Diseases, Inborn/diagnosis; High-Throughput Nucleotide Sequencing; Humans; INDEL Mutation; Polymorphism, Single Nucleotide; Web Browser; Workflow
17.
PLoS One ; 8(4): e60204, 2013.
Article in English | MEDLINE | ID: mdl-23593174

ABSTRACT

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Owing to the lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph-based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 MB), S. kudriavzevii (11.18 MB) and C. elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
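
For planning purposes, read depth relates to read count through the standard relation depth = (number of reads × read length) / genome size. The sketch below applies it to the genome sizes quoted in the abstract. The 100 bp read length used in the example is an assumption made for illustration; the abstract does not state the read length.

```python
import math

def reads_needed(genome_size_bp, read_length_bp, target_depth):
    """Number of reads required to reach target_depth average coverage."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)

def depth(num_reads, read_length_bp, genome_size_bp):
    """Average coverage obtained from num_reads reads of read_length_bp."""
    return num_reads * read_length_bp / genome_size_bp

# Genome sizes as quoted in the abstract; 100 bp reads are an assumption.
genomes_bp = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}
reads_for_50x = {name: reads_needed(size, 100, 50) for name, size in genomes_bp.items()}
```

Under these assumptions, 50X coverage of the 4.6 MB E. coli genome corresponds to 2.3 million reads.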


Subject(s)
Databases, Nucleic Acid; Genome/genetics; Sequence Analysis, DNA/methods; Animals; Caenorhabditis elegans/genetics; Escherichia coli/genetics; Saccharomyces/genetics